Feature Selection for Descriptor Based Classification Models. 2. Human Intestinal Absorption (HIA)

Authors

  • Jörg K. Wegner
  • Holger Fröhlich
  • Andreas Zell
Abstract

We show that the topological polar surface area (TPSA) descriptor and the radial distribution function (RDF) applied to electronic and steric atom properties, such as the conjugated electrotopological state (CETS), are the most relevant features/descriptors for predicting human intestinal absorption (HIA) out of a large set of 2934 features/descriptors. For a HIA data set of 196 molecules with measured HIA values, 2934 features/descriptors were calculated using JOELib and MOE. We used an adaptive boosting algorithm (AdaBoost.M1) to solve the binary classification problem, and variants of Genetic Algorithms based on Shannon Entropy Cliques (GA-SEC) as hybrid feature selection algorithms. Features were selected with respect to the generalization ability of the classification model, so as to avoid high variance on unseen molecules (overfitting).
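As a rough illustration of the classification step named in the abstract, the following is a minimal sketch of AdaBoost.M1 for a binary problem. It is not the authors' implementation: the weak learner (a single-threshold decision stump), the helper names, the early stop on a perfect stump, and the toy data (a TPSA-like column plus a noise column, with invented values and labels) are all assumptions made for this example.

```python
import math

def stump_predict(stump, row):
    """Predict class 1 or 0 from one feature, a threshold, and a polarity."""
    j, t, polarity = stump
    return 1 if polarity * (row[j] - t) > 0 else 0

def train_stump(X, y, w):
    """Return (weighted_error, stump) minimizing the weighted error, or None."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for t in ((a + b) / 2 for a, b in zip(values, values[1:])):
            for polarity in (1, -1):
                stump = (j, t, polarity)
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(stump, row) != yi)
                if best is None or err < best[0]:
                    best = (err, stump)
    return best

def adaboost_m1(X, y, rounds=10):
    """AdaBoost.M1: reweight samples so later stumps focus on earlier mistakes."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []  # list of (vote weight, stump)
    for _ in range(rounds):
        result = train_stump(X, y, w)
        if result is None:
            break
        err, stump = result
        if err >= 0.5:          # weak learner no better than chance: stop
            break
        perfect = err == 0
        err = max(err, 1e-10)   # practical guard against division by zero
        beta = err / (1.0 - err)
        ensemble.append((math.log(1.0 / beta), stump))
        if perfect:             # assumption: stop once a stump fits all samples
            break
        # Shrink the weights of correctly classified samples, then renormalize.
        w = [wi * (beta if stump_predict(stump, row) == yi else 1.0)
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, row):
    """Weighted vote over all stumps; each stump votes with log(1/beta)."""
    votes = {0: 0.0, 1: 0.0}
    for alpha, stump in ensemble:
        votes[stump_predict(stump, row)] += alpha
    return 1 if votes[1] > votes[0] else 0

# Hypothetical toy data: column 0 is a TPSA-like value, column 1 is noise.
# Label 1 stands for "well absorbed"; values and labels are invented.
X_toy = [[20.0, 0.3], [45.0, 0.8], [60.0, 0.1], [85.0, 0.6],
         [120.0, 0.5], [140.0, 0.2], [160.0, 0.9], [200.0, 0.4]]
y_toy = [1, 1, 1, 1, 0, 0, 0, 0]

ensemble = adaboost_m1(X_toy, y_toy, rounds=5)
print(ensemble)
print([predict(ensemble, row) for row in X_toy])
```

Because the toy labels are separable on the first column, boosting stops after a single stump here; on real descriptor data several rounds would normally run, and a wrapper-style feature selection would compare such models across descriptor subsets using held-out molecules.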


Similar resources

Effect of Molecular Descriptor Feature Selection in Support Vector Machine Classification of Pharmacokinetic and Toxicological Properties of Chemical Agents

Statistical-learning methods have been developed for facilitating the prediction of pharmacokinetic and toxicological properties of chemical agents. These methods employ a variety of molecular descriptors to characterize structural and physicochemical properties of molecules. Some of these descriptors are specifically designed for the study of a particular type of properties or agents, and thei...


Prediction of Human Intestinal Absorption by GA Feature Selection and Support Vector Machine Regression

QSAR (Quantitative Structure Activity Relationships) models for the prediction of human intestinal absorption (HIA) were built with molecular descriptors calculated by ADRIANA.Code, Cerius(2) and a combination of them. A dataset of 552 compounds covering a wide range of current drugs with experimental HIA values was investigated. A Genetic Algorithm feature selection method was applied to selec...


MULTI CLASS BRAIN TUMOR CLASSIFICATION OF MRI IMAGES USING HYBRID STRUCTURE DESCRIPTOR AND FUZZY LOGIC BASED RBF KERNEL SVM

Medical Image segmentation is to partition the image into a set of regions that are visually obvious and consistent with respect to some properties such as gray level, texture or color. Brain tumor classification is an imperative and difficult task in cancer radiotherapy. The objective of this research is to examine the use of pattern classification methods for distinguishing different types of...


Statistical Confidence for Variable Selection in QSAR Models via Monte Carlo Cross-Validation

A new variable selection wrapper method named the Monte Carlo variable selection (MCVS) method was developed utilizing the framework of the Monte Carlo cross-validation (MCCV) approach. The MCVS method reports the variable selection results in the most conventional and common measure of statistical hypothesis testing, the P-values, thus allowing for a clear and simple statistical interpretation...


ADME Evaluation in Drug Discovery, 8. The Prediction of Human Intestinal Absorption by a Support Vector Machine

Human intestinal absorption (HIA) is an important roadblock in the formulation of new drug substances. In silico models for predicting the percentage of HIA based on calculated molecular descriptors are highly needed for the rapid estimation of this property. Here, we have studied the performance of a support vector machine (SVM) to classify compounds with high or low fractional absorption (%FA...



Journal:
  • Journal of chemical information and computer sciences

Volume 44, Issue 3

Publication date: 2004